Reviews: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning
Originality: The use of graph neural networks appears novel (concurrent with Paliwal), as does the sweep order, for which I don't know of other papers, at least for this application of graph neural networks. The trick of using architecture search as a dataset also seems novel, and I'm quite happy with this idea.

Quality: The submission is sound, but I have a few minor concerns: 1. It's possible REINFORCE is good enough, but I'm skeptical given that (1) REINFORCE is much worse in normal RL environments and (2) the paper explicitly presents evidence that using an incremental baseline helps learning. The learned value function in PPO, Q-learning, etc. could potentially play the same variance-reduction role, or even do quite a lot better (presumably not all of the variance due to upstream moves is explained by the reward so far).
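The reviewer's variance-reduction point can be illustrated with a toy REINFORCE gradient estimate. The sketch below is not from the paper; the reward distribution, score values, and episode count are all made-up numbers chosen only to show why subtracting a baseline shrinks the variance of the per-episode gradient terms without biasing them:

```python
import numpy as np

# Toy illustration: a baseline reduces the variance of the REINFORCE
# gradient estimate. All numbers here are hypothetical.
rng = np.random.default_rng(0)
n_episodes = 5000

# Per-episode score (grad of log-prob, collapsed to a scalar) and a reward
# that is large on average but only slightly noisy.
grad_logp = rng.normal(size=n_episodes)
rewards = 10.0 + 0.1 * rng.normal(size=n_episodes)

# REINFORCE gradient terms: (R - b) * grad log pi, with b = 0 vs b = mean(R).
terms_no_baseline = rewards * grad_logp
terms_with_baseline = (rewards - rewards.mean()) * grad_logp

var_no_baseline = terms_no_baseline.var()
var_with_baseline = terms_with_baseline.var()
print(var_no_baseline, var_with_baseline)
```

Both estimators have the same expectation (the baseline is independent of the action), but the large constant component of the reward inflates the variance of the baseline-free estimate by orders of magnitude; a learned value function, as in PPO or actor-critic methods, plays the same role with a state-dependent baseline.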
The paper introduces a new RL-based approach to device placement in computation graphs that relies on a graph embedding neural network instead of RNNs. The reviewers were all impressed by the novelty of the proposed approach, the significance of the empirical results, and the ability of the method to generalize across different tasks. While preparing the final version, please take into account the detailed comments and suggestions mentioned in the reviews.
Learning Generalizable Device Placement Algorithms for Distributed Machine Learning
We present Placeto, a reinforcement learning (RL) approach to efficiently find device placements for distributed neural network training. Unlike prior approaches that only find a device placement for a specific computation graph, Placeto can learn generalizable device placement policies that can be applied to any graph. We propose two key ideas in our approach: (1) we represent the policy as performing iterative placement improvements, rather than outputting a placement in one shot; (2) we use graph embeddings to capture relevant information about the structure of the computation graph, without relying on node labels for indexing. These ideas allow Placeto to train efficiently and generalize to unseen graphs. Our experiments show that Placeto requires up to 6.1x fewer training steps to find placements that are on par with or better than the best placements found by prior approaches.
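The first key idea, a policy that makes iterative placement improvements rather than emitting a placement in one shot, can be sketched with a toy model. The graph, node costs, communication cost, and greedy node-update rule below are illustrative stand-ins: Placeto learns an RL policy over graph embeddings and optimizes measured runtimes, not this hand-written cost function:

```python
import numpy as np

# Hypothetical 4-op computation graph: edges and per-node compute costs.
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
compute = np.array([4.0, 2.0, 2.0, 4.0])
num_devices, comm_cost = 2, 1.0

def cost(placement):
    """Toy stand-in for measured runtime: the most-loaded device's total
    compute, plus a fixed penalty per cross-device edge."""
    loads = np.bincount(placement, weights=compute, minlength=num_devices)
    cross = sum(placement[u] != placement[v] for u, v in edges)
    return loads.max() + comm_cost * cross

def iterative_improve(placement, sweeps=3):
    """Placeto-style MDP: visit one node per step and (re)assign its device.
    A greedy rule replaces the learned RL policy here, for illustration."""
    placement = placement.copy()
    for _ in range(sweeps):
        for v in range(len(compute)):
            candidates = []
            for d in range(num_devices):
                trial = placement.copy()
                trial[v] = d
                candidates.append((cost(trial), d))
            placement[v] = min(candidates)[1]
    return placement

start = np.zeros(len(compute), dtype=int)   # all ops on device 0
final = iterative_improve(start)
print(cost(start), "->", cost(final), final)
```

Each node visit is one MDP step whose state is the graph plus the current placement; in Placeto the action distribution at each step is computed from graph embeddings, so the same trained policy transfers to unseen graphs.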
Ravichandra Addanki, Shaileshh Bojja Venkatakrishnan, Shreyan Gupta, Hongzi Mao, Mohammad Alizadeh